Parallel K-Medoids++ Spatial Clustering Algorithm Based on MapReduce

Authors

  • Xia Yue
  • Wang Man
  • Jun Yue
  • Guangcao Liu
Abstract

Clustering analysis has received considerable attention in spatial data mining for several years. With the rapid development of geospatial information technologies, the volume of spatial data is growing exponentially, which makes clustering massive spatial data a challenging task. To improve the efficiency of spatial clustering on large-scale data, many researchers have proposed efficient parallel clustering algorithms. In this paper, a new K-Medoids++ spatial clustering algorithm based on MapReduce is proposed for clustering massive spatial data. An initialization algorithm that decreases the number of iterations is combined with the MapReduce framework. Comparative experiments conducted over different datasets and different numbers of nodes indicate that the proposed K-Medoids spatial clustering algorithm is more efficient than traditional K-Medoids and scales well when processing massive spatial data on commodity hardware.
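The abstract does not spell out the algorithm, so the following is only a minimal single-machine sketch of the idea it describes, under stated assumptions: the seeding is assumed to follow the k-means++ pattern (each new medoid is drawn with probability proportional to its squared distance from the nearest medoid already chosen), points are assumed to be 2-D tuples compared by Euclidean distance, and the map and reduce phases are emulated with plain Python functions rather than a real MapReduce runtime. All function names (seed_medoids, map_assign, reduce_update, k_medoids) are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two 2-D points (assumed metric)."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def seed_medoids(points, k, rng):
    """Assumed k-means++-style seeding restricted to data points.

    Requires at least k distinct points; otherwise all weights become zero.
    """
    medoids = [rng.choice(points)]
    while len(medoids) < k:
        # Weight every point by its squared distance to the nearest medoid so far.
        weights = [min(dist(p, m) for m in medoids) ** 2 for p in points]
        medoids.append(rng.choices(points, weights=weights, k=1)[0])
    return medoids


def map_assign(point, medoids):
    """Map step: emit (index of nearest medoid, point)."""
    idx = min(range(len(medoids)), key=lambda i: dist(point, medoids[i]))
    return idx, point


def reduce_update(members):
    """Reduce step: pick the member with the smallest total distance to the others."""
    return min(members, key=lambda c: sum(dist(c, p) for p in members))


def k_medoids(points, k, seed=0, max_iter=50):
    """Alternate emulated map and reduce phases until the medoids stop changing."""
    rng = random.Random(seed)
    medoids = seed_medoids(points, k, rng)
    for _ in range(max_iter):
        groups = defaultdict(list)  # emulates the shuffle: group points by medoid index
        for p in points:
            idx, pt = map_assign(p, medoids)
            groups[idx].append(pt)
        updated = [
            reduce_update(groups[i]) if groups[i] else medoids[i]
            for i in range(len(medoids))
        ]
        if updated == medoids:
            break
        medoids = updated
    return medoids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(k_medoids(pts, 2, seed=1))
```

In a distributed setting the per-point assignment would run in mappers and the per-cluster medoid search in reducers. The exhaustive search in reduce_update is quadratic in cluster size, which is acceptable for a sketch but would likely need sampling or partitioning on massive data.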

Similar resources

A Rank-Based K-medoids Clustering Algorithm by a Specific P System

In this paper, a rank-based K-medoids algorithm by a specific P system is proposed, which exhibits a novel aspect of applying membrane computing to clustering. The result of traditional K-medoids clustering is sensitive to randomly selected initial medoids. To overcome this defect, we introduce a rank based on the similarity between pairs of objects for the initialization. As a biological computi...

Parallel K-Means Clustering Based on MapReduce

Data clustering has received considerable attention in many applications, such as data mining, document retrieval, image segmentation and pattern classification. The growing volume of information produced by the progress of technology makes clustering very large-scale data a challenging task. In order to deal with the problem, many researchers try to design efficient parallel clu...

Parallel Multi-Swarm PSO Based on K-Medoids and Uniform Design

PAM (Partitioning Around Medoids) is introduced to divide the swarm into several different subpopulations. PAM is one of the k-medoids clustering algorithms based on partitioning methods. It attempts to divide n objects into k partitions. This algorithm overcomes the k-means algorithm's sensitivity to the initial partitions. In the parallel PSO algorithms, the swarm needs to be divi...

Parallelising the k-Medoids Clustering Problem Using Space-Partitioning

The k-medoids problem is a combinatorial optimisation problem with multiple applications in Resource Allocation, Mobile Computing, Sensor Networks and Telecommunications. Real instances of this problem involve hundreds of thousands of points and thousands of medoids. Despite the proliferation of parallel architectures, this problem has been mostly tackled using sequential approaches. In this p...

MapReduce K-Means based Co-Clustering Approach for Web Page Recommendation System

Co-clustering is one of the data mining techniques used for web usage mining. Co-clustering Web log data is the process of simultaneously categorizing both users and pages. It is used to extract the users’ information based on a subset of pages. Nowadays, the cyberspace is filled with a huge volume of data distributed across the world. The business knowledge acquaintance from such a voluminous d...

Journal:
  • CoRR

Volume abs/1608.06861

Publication date: 2016